Theoretical Ecology
(EEMB 595TE) Spring 2018
Class 6: Multi-state mark-recapture models
To update the course repository on your computer, you will pull a current copy of the repository. To do this:
Open your terminal/bash.
Navigate to the course repository. If this is in your root directory then type:
cd
cd EEMB595TE
Paste the following into the terminal/bash:
git pull
Today's objectives:
To learn the general principles of what occurs mathematically when you indicate a random effect
To learn how to create a state-space model
Learn the differences between types of mark-recapture models
| Week | Date | Subject | Book chapters |
|---|---|---|---|
| 1 | 5-Apr | Course outline & Github | — |
| 2 | 12-Apr | Bayesian vs. Frequentist | Chpt 1,2 |
| 3 | 19-Apr | GLM in R and JAGS | Chpt 3 |
| 4 | 26-Apr | Random effects & State-space models | Chpt 4, 5 |
| 5 | 3-May | Random effects & State-space models | Chpt 4, 5 |
| 6 | 10-May | Analyzing mark-recapture data (JS models) | Chpt 10 |
| 7 | 17-May | Occupancy & N-mixture models | Chpt 13 |
| 8 | 24-May | Student presentations | — |
| 9 | 31-May | Student presentations | — |
| 10 | 7-Jun | Student presentations | — |
Everyone should have their presentation/code ready by 24 May 2018. Put it in a Google Drive folder, Dropbox, Box, or on a thumb drive.
Only 3 people have signed up so far; please do so as soon as possible. Presentations will start two weeks from today.
The examples listed here are not all-inclusive:
Age structure: baby, juvenile, adult, dead
Disease: infected, not infected, dead
Movement: location A, location B, dead
Sexual maturity:
Fruiting adult, not fruiting adult, dead
Breeding adult, not breeding adult, dead
Plant state: vegetative, flowering, dormant
The two most popular mark-recapture models are:
Cormack-Jolly-Seber Model (CJS)
Jolly-Seber Model (JS)
There are others as well.
What parameters can you estimate using these two models?
| Cormack-Jolly-Seber Model | Jolly-Seber |
|---|---|
| Survival (\(\phi_{1}\)) | Survival (\(\phi_{1}\)) |
| Survival (\(\phi_{2}\)) | Survival (\(\phi_{2}\)) |
| Transition probability (\(\psi_{1,2}\)) | Transition probability (\(\psi_{1,2}\)) |
| Transition probability (\(\psi_{2,1}\)) | Transition probability (\(\psi_{2,1}\)) |
| | Recruitment (\(\gamma_{1}\)) |
| | Recruitment (\(\gamma_{2}\)) |
| | Abundance (\(N\)) |
What is the main difference between the CJS and JS models?
The Cormack-Jolly-Seber model disregards the data prior to the first encounter:
So it cannot estimate recruitment
y = [0, 0, 0, 1, 1, 0, 0]
The model will start using information at the fourth season, when the individual was first captured (0 = not seen; 1 = seen)
The Jolly-Seber model uses the entire capture history:
So it can estimate recruitment
y = [0, 0, 0, 1, 1, 0, 0]
The model will take advantage of the entire capture history, starting with the first season
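This difference can be illustrated in a few lines of R, using the hypothetical capture history from above:

```r
# Hypothetical capture history: 0 = not seen, 1 = seen
y <- c(0, 0, 0, 1, 1, 0, 0)

f <- min(which(y == 1))   # first capture occasion: 4
y_cjs <- y[f:length(y)]   # CJS conditions on first capture and models y[4:7]
y_js  <- y                # JS models the full history, leading zeros included
```

The CJS likelihood conditions on first capture, so the three leading zeros carry no information for it; the JS likelihood treats them as occasions on which the individual may not yet have entered the population, which is what allows it to estimate recruitment.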
The transition and observation probabilities must be the same for all individuals at a given occasion and state
Individuals must be independent from one another
Individuals and states are recorded without error
No marks are lost
Same as the Cormack-Jolly-Seber model, plus two more:
The captured individuals are a random sample of the population
All individuals (marked and unmarked) in the population have the same capture probability
State process: Unobserved (truth)
Observed process: Observed (with sampling error)
Where y = 1 (seen) and y = 0 (not seen)
The observations for one individual would be:
y = [1, 1, 0, 1, 0, 0, 0]
We need to determine whether, during the last three sampling periods, the individual was dead or alive but undetected
We monitor several individuals so the data we collect will look like this:
## Season1 Season2 Season3 Season4 Season5
## 1 1 0 1 0 0
## 2 1 0 1 1 1
## 3 0 0 1 0 0
## 4 0 0 1 1 1
## 5 0 1 1 0 0
## 6 1 1 1 1 1
## 7 0 0 1 0 0
## 8 0 1 1 1 0
## 9 0 0 0 1 0
## 10 0 1 0 1 0
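Data of this form can be generated with a short state-space simulation. The survival and detection probabilities below are assumed values for illustration, not the ones behind the table above:

```r
set.seed(1)
n_ind <- 10   # individuals
n_occ <- 5    # sampling occasions
phi   <- 0.8  # survival probability (assumed)
p     <- 0.6  # detection probability (assumed)

z <- matrix(0, n_ind, n_occ)  # true state: 1 = alive, 0 = dead
y <- matrix(0, n_ind, n_occ)  # observation: 1 = seen, 0 = not seen
z[, 1] <- 1                   # everyone alive at the first occasion
for (i in 1:n_ind) {
  for (t in 2:n_occ) {
    # State process: survive with probability phi, only if alive at t-1
    z[i, t] <- rbinom(1, 1, z[i, t - 1] * phi)
  }
}
# Observation process: detected with probability p, only if alive
y[] <- rbinom(n_ind * n_occ, 1, as.vector(z) * p)
```

Because detections are only possible when `z = 1`, every `y` matrix produced this way is consistent with the latent states, which is exactly the structure the state-space model exploits.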
State process: Unobserved (truth)
Observed process: Observed (with sampling error)
Where y = 1 (seen in state 1), y = 2 (seen in state 2), and y = 0 (not seen)
You define and categorize what state 1 and state 2 represent
The observations for one individual would be:
y = [1, 2, 0, 1, 0, 0, 0]
We need to determine whether, during the last three sampling periods, the individual was dead or alive but undetected
We monitor several individuals so the data we collect will look like this:
## Season1 Season2 Season3 Season4 Season5
## 1 1 1 0 1 2
## 2 1 0 0 0 0
## 3 0 1 2 1 1
## 4 1 0 0 0 1
## 5 1 1 0 2 0
## 6 0 0 1 0 0
## 7 0 0 1 0 1
## 8 0 2 0 1 1
## 9 2 2 0 1 0
## 10 1 0 2 0 1
We are interested in disease state of the chytrid fungus (infected or not infected) for amphibians.
Data collection methods:
We go out 1 night for 10 years
We give individuals unique IDs (using toe clipping method)
We collect skin swab samples from individuals to determine in the lab the disease state
We denote the true, but unknowable, disease state of individual i at occasion t as \(z_{i,t}\), where \(z_{i,t} = 1\ldots4\) represents the states “not entered”, “uninfected”, “infected”, or “dead”, respectively. To estimate monthly survival \((\phi)\), recruitment \((\gamma)\), and transition \((\beta)\) rates, we use the transition matrix \(\psi\), whose cells give the probabilities of moving from a row state to a column state from time t-1 to time t for individual i.
The “not entered” category consists of individuals that are not yet part of the population. The parameters \(\gamma_{2}\) and \(\gamma_{3}\) are the state-specific entry probabilities for uninfected and infected hosts from t-1 to t, i.e., the probability that an individual enters the population in that state at time t. The parameters \(\phi_{2}\) and \(\phi_{3}\) are the state-specific survival probabilities for uninfected and infected hosts from t-1 to t, while \(\beta_{UI}\) and \(\beta_{IU}\) are the infection and recovery probabilities, respectively. In other words, if individual i survives from t-1 to t, it can become infected (with probability \(\beta_{UI}\)) if it was uninfected at t-1, or recover (with probability \(\beta_{IU}\)) if it was infected at t-1. Each parameter must lie between 0 and 1, and each row of \(\psi\) must sum to 1. For the survival and transition rows this holds automatically; for recruitment it is less straightforward. One solution is to choose a Dirichlet prior for \([\gamma_{1}, \gamma_{2}, \gamma_{3}]\) to ensure that these parameters sum to 1.
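In JAGS, such a prior can be written with the `ddirch` distribution (a sketch; `alpha` would be supplied as data, e.g. `rep(1, 3)` for a flat prior). Note that the model code later in these notes instead places independent uniform priors on the entry probabilities:

```
# Sketch of a Dirichlet prior on the three recruitment parameters
gamma[1:3] ~ ddirch(alpha[1:3])   # alpha supplied as data, e.g. rep(1, 3)
```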
Because there are more than two possible true and observed states, we use the categorical distribution to model the transition from one state to another for individual i each time step:
\[z_{i,t} \mid z_{i,t-1} \sim \textrm{categorical}(\psi_{z_{i,t-1},1:4}),\]
where \(\psi\) is the transition matrix we just defined. Note that the argument of the categorical distribution is a vector of length 4; that is, it is the row of the matrix \(\psi\) corresponding to the state of individual i in time step t-1.
To estimate host recapture probabilities, we use a second transition matrix \(\pi\), which gives the probabilities of the possible observation outcomes (columns) for the true state of each individual (rows). We assume no partial observations or state misclassifications, and the observed states are “seen uninfected”, “seen infected”, and “not seen”.
We use the categorical distribution to model the observed state \(y_{i,t}\) as a function of the true state:
\[y_{i,t} \mid z_{i,t} \sim \textrm{categorical}(\pi_{z_{i,t},1:3}),\]
where \(\pi\) is the observation matrix we just defined. As before, the argument of the categorical distribution is a vector of length 3; that is, it is the row of the matrix \(\pi\) corresponding to the true state of individual i in time step t.
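Both categorical draws can be mimicked in R with `sample()`. The matrices below are hypothetical values chosen only to make the sketch self-contained:

```r
set.seed(42)
K <- 7  # sampling occasions

# Hypothetical transition matrix psi
# (rows/columns: 1 = not entered, 2 = uninfected, 3 = infected, 4 = dead)
psi <- rbind(c(0.90, 0.05, 0.05, 0.00),
             c(0.00, 0.60, 0.25, 0.15),
             c(0.00, 0.10, 0.70, 0.20),
             c(0.00, 0.00, 0.00, 1.00))

# Hypothetical observation matrix pi
# (columns: 1 = seen uninfected, 2 = seen infected, 3 = not seen)
pi_obs <- rbind(c(0.0, 0.0, 1.0),   # not entered: never seen
                c(0.6, 0.0, 0.4),   # uninfected: seen with prob 0.6
                c(0.0, 0.5, 0.5),   # infected: seen with prob 0.5
                c(0.0, 0.0, 1.0))   # dead: never seen

z <- y <- integer(K)
z[1] <- 1                                      # start in "not entered"
y[1] <- sample(1:3, 1, prob = pi_obs[z[1], ])
for (t in 2:K) {
  z[t] <- sample(1:4, 1, prob = psi[z[t - 1], ])   # state process
  y[t] <- sample(1:3, 1, prob = pi_obs[z[t], ])    # observation process
}
```

Each draw picks a row of the relevant matrix, exactly as the `dcat()` calls in the JAGS model below pick `ps[z[i,t-1], i, t-1, ]` and `po[z[i,t], i, t-1, ]`.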
CH <- read.csv("/Users/Cici/EEMB595TE/6_Multi-state_Mark_Recapture_Models/Data/data_amphibian.csv")[,-1]
Look at the structure of the data and make sure it was imported correctly
str(CH)
## 'data.frame': 181 obs. of 5 variables:
## $ V1: int 0 1 1 0 1 0 1 1 1 0 ...
## $ V2: int 1 0 2 0 2 1 0 0 0 2 ...
## $ V3: int 1 0 0 0 0 0 0 2 0 0 ...
## $ V4: int 0 0 0 1 0 1 0 0 0 0 ...
## $ V5: int 1 0 0 2 0 1 0 0 0 0 ...
Remember our observations: - 0 = not seen - 1 = seen as uninfected - 2 = seen as infected
head(CH)
## V1 V2 V3 V4 V5
## 1 0 1 1 0 1
## 2 1 0 0 0 0
## 3 1 2 0 0 0
## 4 0 0 0 1 2
## 5 1 2 0 0 0
## 6 0 1 0 1 1
**Note that this code includes Bayesian posterior predictive check code, which we will ignore for now.**
Quick note: the Bayesian posterior predictive check, also known as the Bayesian p-value, evaluates how well your data match predictions from the model you created. Values near 0.5 indicate a well-fitting model, whereas values near 0.95 or 0.05 indicate a poor-fitting model.
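Given posterior draws of a discrepancy measure for the actual data and for replicated data (e.g., nodes like the commented-out `zzz.fit` and `zzz.fit.new` in the model code), the Bayesian p-value is simply the proportion of draws in which the replicated discrepancy exceeds the actual one. A sketch with made-up draws:

```r
# Sketch: Bayesian p-value from posterior draws of two discrepancy measures.
# The draws below are made up for illustration only.
set.seed(7)
fit_actual <- rnorm(1000, mean = 50, sd = 5)  # discrepancy of observed data
fit_new    <- rnorm(1000, mean = 50, sd = 5)  # discrepancy of replicated data

bayes_p <- mean(fit_new > fit_actual)         # near 0.5 = well-fitting model
```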
{
sink("model_JS.txt")
cat("
model {
#--------------------------------------
# Parameters:
# phi_U: survival probability of uninfected
# phi_I: survival probability of infected
# gamma_U: removal entry probability of uninfected
# gamma_I: removal entry probability of infected
# p_U: capture probability of uninfected
# p_I: capture probability of infected
#--------------------------------------
# States (S):
# 1 not yet entered
# 2 uninfected
# 3 infected
# 4 dead
# Observations (O):
# 1 seen as uninfected
# 2 seen as infected
# 3 not seen
#--------------------------------------
# Priors and constraints
phi_U ~ dunif(0, 1) # Prior for mean survival
phi_I ~ dunif(0, 1) # Prior for mean survival
for(t in 1:(K-1)){
gamma_U[t] ~ dunif(0, 1) # Prior for entry probabilities
gamma_I[t] ~ dunif(0, 1) # Prior for entry probabilities
}
p_U ~ dunif(0, 1) # Prior for mean capture
p_I ~ dunif(0, 1) # Prior for mean capture
beta_UI ~ dunif(0, 1)
beta_IU ~ dunif(0, 1)
# Define state and observation matrices
for(i in 1:M){
for(t in 1:(K-1)){
# Define probabilities of state S(t+1) given S(t)
# This is the psi matrix from above
ps[1,i,t,1] <- 1 - gamma_U[t] - gamma_I[t]
ps[1,i,t,2] <- gamma_U[t]
ps[1,i,t,3] <- gamma_I[t]
ps[1,i,t,4] <- 0
ps[2,i,t,1] <- 0
ps[2,i,t,2] <- phi_U * (1 - beta_UI)
ps[2,i,t,3] <- phi_U * beta_UI
ps[2,i,t,4] <- 1 - phi_U
ps[3,i,t,1] <- 0
ps[3,i,t,2] <- phi_I * beta_IU
ps[3,i,t,3] <- phi_I * (1 - beta_IU)
ps[3,i,t,4] <- 1 - phi_I
ps[4,i,t,1] <- 0
ps[4,i,t,2] <- 0
ps[4,i,t,3] <- 0
ps[4,i,t,4] <- 1
# This is the pi matrix from above
po[1,i,t,1] <- 0
po[1,i,t,2] <- 0
po[1,i,t,3] <- 1
po[2,i,t,1] <- p_U
po[2,i,t,2] <- 0
po[2,i,t,3] <- 1 - p_U
po[3,i,t,1] <- 0
po[3,i,t,2] <- p_I
po[3,i,t,3] <- 1 - p_I
po[4,i,t,1] <- 0
po[4,i,t,2] <- 0
po[4,i,t,3] <- 1
} #t
} #i
#------ Likelihood model
for (i in 1:M){
# Define latent state at first occasion
z[i, 1] <- 1
# Make sure that all M individuals are in state 1 at t = 1; not entered
for (t in 2:K){
# State process: draw S(t) given S(t-1)
z[i,t] ~ dcat(ps[z[i,t-1], i, t-1, ])
# Observation process: draw O(t) given S(t)
y[i,t] ~ dcat(po[z[i,t], i, t-1, ])
}
}
#------ Calculate derived population parameters for recruitment
for (t in 1:(K-1)){
qgamma_U[t] <- 1-gamma_U[t]
qgamma_I[t] <- 1-gamma_I[t]
}
cprob_U[1] <- gamma_U[1]
cprob_I[1] <- gamma_I[1]
for (t in 2:(K-1)){
cprob_U[t] <- gamma_U[t] * prod(qgamma_U[1:(t-1)])
cprob_I[t] <- gamma_I[t] * prod(qgamma_I[1:(t-1)])
} #t
psi_U <- sum(cprob_U[]) # Inclusion probability
psi_I <- sum(cprob_I[]) # Inclusion probability
for (t in 1:(K-1)){
b_U[t] <- cprob_U[t] / psi_U # Entry probability
b_I[t] <- cprob_I[t] / psi_I # Entry probability
} #t
for (i in 1:M){
for (t in 2:K){
al_U[i,t-1] <- equals(z[i,t], 2)
al_I[i,t-1] <- equals(z[i,t], 3)
} #t
for (t in 1:(K-1)){
d_U[i,t] <- equals(z[i,t] - al_U[i,t], 0)
d_I[i,t] <- equals(z[i,t] - al_I[i,t], 0)
} #t
alive[i] <- sum(al_U[i,], al_I[i,])
} #i
for (t in 1:(K-1)){
N_U[t] <- sum(al_U[,t]) # Actual population size
N_I[t] <- sum(al_I[,t]) # Actual population size
B_U[t] <- sum(d_U[,t]) # Number of entries
B_I[t] <- sum(d_I[,t]) # Number of entries
} #t
for (i in 1:M){
w[i] <- 1-equals(alive[i],0)
} #i
Nsuper <- sum(w[]) # Superpopulation size
#---------------- Calculate Bayesian posterior predictive check
#for(t in 1:(K-1)){
# for(i in 1:M){
# for(s in 1:state){
# for(j in 1:n.surv)
# r[s, i, t, j] <- ifelse(y[i, t+1, j] == s, 1, 0)
# r.new[s, i, t, j] <- ifelse(y.new[i, t+1, j] == s, 1, 0)
# }
# }
# }
#}
#
#for(t in 1:(K-1)){
# for(s in 1:state){
# # sum across individuals in each state, each time period
# R_state[s, t] <- sum(r[s, , t, ])
# R_state.new[s, t] <- sum(r.new[s, , t, ])
# }
#}
#
#for(t in 1:(K-1)){
# for(s in 1:state){
# for(i in 1:M){
# for(j in 1:n.surv)
#
# muy[i, j, t, s] <- ps[z[i, t], i, t, z[i, t+1]] *
# po[z[i, t], i, t, j, s]
# }
# }
# PO_expt[s, t] <- sum(0.01, sum(muy[ , , t, s]))
# }
#}
#
##--------- Posterior predictive check
#
#for(t in 1:(K-1)){
# for(s in 1:state){
# E.act[s, t] <- pow(pow(R_state[s, t], 0.5) - pow(PO_expt[s, t], 0.5), 2)
# E.new[s, t] <- pow(pow(R_state.new[s, t], 0.5) - pow(PO_expt[s, t], 0.5), 2)
# }
#}
#
#zzz.fit <- sum(E.act[,])
#zzz.fit.new <- sum(E.new[,])
}
",fill = TRUE)
sink()
}
# Analysis of the JS model as a multistate model
# The data augmentation approach allows us to estimate the total population size
# Idea: add a large number of all-zero capture histories; the model then determines whether those individuals were missed or do not exist, using information from the detection probability and recruitment rates
nz <- 500
# Format the data with an added dummy occasion
# Dummy column = the first column is all 0s (no one is seen before the study starts)
# Allows the model to estimate recruitment into the population for the first season
CH.ms <- array(0, dim = c(dim(CH)[1]+nz, dim(CH)[2]+1))
for(i in 1:dim(CH)[1]){
for(j in 1:dim(CH)[2]){
CH.ms[i,j+1] <- CH[i, j]
}
}
CH.du <- CH.ms[1:dim(CH)[1],]
# Recode CH matrix: a 0 is not allowed in JAGS!
CH.ms[CH.ms==0] <- 3 # Not seen = 3, seen = 1 or 2
# Bundle data
jags.data <- list(y = CH.ms,
M = dim(CH.ms)[1],
K = dim(CH.ms)[2],
n.surv = 1,
state = 3
)
# Initial values
ch <- CH.du
js.multistate.init <- function(ch, nz){
# 4 state system
# 1 = Not entered
# 2 = uninfected
# 3 = infected
# 4 = Dead
# 3 observation states
# seen uninfected
# seen infected
# not seen
# Put an NA when an individual was not seen ( = 0 or 3)
ch[ch==0] <- NA
ch[ch==3] <- NA
state <- ch
colnames(state) <- 1:dim(state)[2]
# When the individual is known to be alive between first and last capture fill it in with 2s
for(i in 1:dim(ch)[1]){ # For each individual
n1 <- min(which(ch[i,] < 3))
n2 <- max(which(ch[i,] < 3))
fill <- which(is.na(state[i,n1:n2]) == TRUE)
state[i, names(fill)] <- 2
}
state <- state + 1
f <- array(NA, dim = c(dim(ch)[1]))
for(i in 1:dim(ch)[1]){
f[i] <- min(which(!is.na(ch[i, ])))
}
l <- array(NA, dim = c(dim(ch)[1]))
for(i in 1:dim(ch)[1]){
l[i] <- max(which(!is.na(ch[i, ])))
}
for (i in 1:dim(ch)[1]){
# Before initial observation- not observed
state[i,1:(f[i]-1)] <- 1
# If the last time the animal was seen != the last survey date
# Then add 1 to the survey date, and fill the rest to the last occasion with 4
if(l[i]!= dim(ch)[2]){state[i, (l[i]+1):dim(ch)[2]] <- 4}
if(ch[i, f[i]] == 1){state[i, f[i]] <- 2}
if(ch[i, f[i]] == 2){state[i, f[i]] <- 3}
}
state2 <- array(NA, dim = c(dim(ch)[1]+nz, dim(ch)[2]))
state3 <- array(NA, dim = c(dim(ch)[1]+nz, dim(ch)[2]))
# Copy over the matrix you generated to the one with the right dimensions
for(i in 1:dim(ch)[1]){
for(j in 1:(dim(ch)[2])){
state2[i,j] <- state[i, j]
}
}
# For all the data augmented individuals- their state == 1
for(i in (dim(state)[1]+1):dim(state2)[1]){
for(j in 1:(dim(state)[2])){ # For all occasions
state2[i,j] <- 1
}
}
return(state2)
}
n.occasions <- dim(CH.ms)[2]
zinit <- js.multistate.init(CH.du, nz)
zinit <- apply(zinit, c(1, 2), max)
zinit[,1] <- NA
#--- Bundle the inits
inits <- function(){list(phi_U = runif(1, 0.5, 1),
phi_I = runif(1, 0.5, 1),
p_U = runif(1, 0.5, 1),
p_I = runif(1, 0.5, 1),
beta_UI = runif(1, 0, 0.5),
beta_IU = runif(1, 0, 0.5),
gamma_U = rep(runif(1, 0, 0.5), times = n.occasions-1),
gamma_I = rep(runif(1, 0, 0.5), times = n.occasions-1),
z = zinit
)}
#------- Parameters monitored
params <- c("phi_U",
"phi_I",
"p_U",
"p_I",
"beta_UI",
"beta_IU",
"gamma_U",
"gamma_I")
#------- MCMC settings
ni <- 1000
nb <- 100
nt <- 10
nc <- 3
na <- 100
#------ call Library
library("jagsUI")
## Loading required package: lattice
##
## Attaching package: 'jagsUI'
## The following object is masked from 'package:utils':
##
## View
#------- Call JAGS from R
js.ms <- jags(data = jags.data, inits = inits, parameters.to.save = params, model.file = "model_JS.txt", n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, n.adapt = na, parallel = TRUE)
##
## Processing function input.......
##
## Done.
##
## Beginning parallel processing using 3 cores. Console output will be suppressed.
##
## Parallel processing completed.
##
## Calculating statistics.......
##
## Done.
print(js.ms, dig = 3)
## JAGS output for model 'model_JS.txt', generated by jagsUI.
## Estimates based on 3 chains of 1000 iterations,
## adaptation = 100 iterations (sufficient),
## burn-in = 100 iterations and thin rate = 10,
## yielding 270 total samples from the joint posterior.
## MCMC ran in parallel for 2.58 minutes at time 2018-05-10 10:39:56.
##
## mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
## phi_U 0.902 0.047 0.801 0.907 0.980 FALSE 1 1.037 58
## phi_I 0.821 0.046 0.725 0.822 0.901 FALSE 1 1.011 149
## p_U 0.657 0.092 0.490 0.656 0.854 FALSE 1 1.060 36
## p_I 0.578 0.059 0.467 0.574 0.697 FALSE 1 1.034 61
## beta_UI 0.419 0.065 0.292 0.415 0.554 FALSE 1 1.009 270
## beta_IU 0.100 0.035 0.043 0.096 0.183 FALSE 1 1.005 232
## gamma_U[1] 0.043 0.013 0.024 0.042 0.071 FALSE 1 1.006 270
## gamma_U[2] 0.028 0.011 0.010 0.026 0.054 FALSE 1 1.002 270
## gamma_U[3] 0.040 0.013 0.019 0.038 0.068 FALSE 1 1.030 70
## gamma_U[4] 0.060 0.015 0.036 0.059 0.093 FALSE 1 1.018 270
## gamma_U[5] 0.022 0.011 0.004 0.021 0.046 FALSE 1 1.004 270
## gamma_I[1] 0.043 0.011 0.024 0.042 0.067 FALSE 1 1.001 270
## gamma_I[2] 0.029 0.013 0.006 0.029 0.056 FALSE 1 1.025 185
## gamma_I[3] 0.031 0.014 0.005 0.030 0.061 FALSE 1 1.002 270
## gamma_I[4] 0.058 0.017 0.029 0.058 0.094 FALSE 1 1.021 118
## gamma_I[5] 0.030 0.016 0.004 0.028 0.072 FALSE 1 0.996 270
## deviance 753.818 66.771 625.875 758.226 872.954 FALSE 1 1.068 35
##
## Successful convergence based on Rhat values (all < 1.1).
## Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## For each parameter, n.eff is a crude measure of effective sample size.
##
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
##
## DIC info: (pD = var(deviance)/2)
## pD = 2112 and DIC = 2865.849
## DIC is an estimate of expected predictive error (lower is better).
plot(js.ms)
The data set we used was actually simulated! We know the truth!
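Since the data were simulated, the posterior means printed above can be placed side by side with the generating values. The `truth` entries below are hypothetical placeholders, not the actual simulation inputs (which are not listed in these notes):

```r
# Posterior means copied from the printed jagsUI summary above
est <- c(phi_U = 0.902, phi_I = 0.821, p_U = 0.657, p_I = 0.578)

# Hypothetical generating values -- placeholders only, not the real
# simulation settings
truth <- c(phi_U = 0.90, phi_I = 0.80, p_U = 0.65, p_I = 0.60)

round(cbind(estimate = est, truth = truth, error = est - truth), 3)
```

If the model is working, the errors should be small relative to the posterior standard deviations, and the true values should fall inside the 95% credible intervals.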